3 research outputs found

    A database and digital signal processing framework for the perceptual analysis of voice quality

    Bermúdez de Alvear RM, Corral J, Tardón LJ, Barbancho AM, Fernández Contreras E, Rando Márquez S, Martínez-Arquero AG, Barbancho I. A database and digital signal processing framework for the perceptual analysis of voice quality. Pan European Voice Conference: PEVOC 11 Abstract Book. Aug. 31-Sept. 2, 2015.
    Introduction.
    Clinical assessment of dysphonia relies on perceptual as much as on instrumental methods of analysis [1]. Perceptual auditory analysis is potentially subject to several internal and external sources of bias [2]. Furthermore, the acoustic analyses used to objectively characterize pathological voices are likely to be affected by confounding variables such as the signal processing or the hardware and software specifications [3]. For these reasons, the poor correlation between perceptual ratings and acoustic measures remains a controversial matter [4]. The availability of annotated databases of voice samples is therefore of prime importance for clinical and research purposes. Databases for digital processing of the vocal signal are usually built from sustained vowels produced by English-speaking subjects [5]. However, phonemes vary from one language to another and, to the best of our knowledge, there are no annotated databases of Spanish sustained vowels from healthy or dysphonic voices. This work presents our first steps toward filling this gap. To aid clinicians and researchers in the perceptual assessment of voice quality, a two-fold objective was achieved. On the one hand, a database of healthy and disordered Spanish voices was developed; on the other, an automatic analysis scheme was built on signal processing algorithms and supervised machine learning techniques.
    Material and methods.
    A preliminary annotated database was created with 119 recordings of the sustained Spanish /a/; they were perceptually labeled by three experts experienced in vocal quality analysis.
    It is freely available under Links on the ATIC website (www.atic.uma.es). Voice signals were recorded using a headset cardioid condenser microphone (AKG C-544 L) positioned 5 cm from the commissure of the speaker's mouth. Speakers were instructed to sustain the Spanish vowel /a/ for 4 seconds. The microphone was connected to an Edirol R-09HR digital recorder. Voice signals were digitized at 16 bits with a 44,100 Hz sampling rate. Afterwards, the initial and final 0.5-second segments were discarded and the 3-second middle portion was selected for acoustic analysis. Judges used Sennheiser HD219 headphones to evaluate the voice samples perceptually. To label the recordings, raters used the Grade-Roughness-Breathiness (GRB) perceptual scale, a version of Hirano's original GRBAS scale as later modified by Dejonckere et al. [6]. To improve intra- and inter-rater agreement, two modifications were introduced in the rating procedure: the resolution of the 0-3 point scale was increased by adding subintervals to the standard 0-3 intervals, and judges were given a written protocol with explicit definitions of the subinterval boundaries. In this way, judges could compensate for the potential instability of their internal representations caused by the influence of the perceptual context [7]. The raters performed their perceptual evaluations simultaneously, with the Sennheiser HD219 headphones connected to a Behringer HA4700 Powerplay Pro-XL multi-channel headphone preamplifier. The YIN algorithm [8] was selected as the initial front-end to identify voiced frames and extract their fundamental frequency. For the digital processing of the voice signals, some conventional acoustic parameters [6] were selected. To complete the analysis, Mel-Frequency Cepstral Coefficients (MFCC) were also calculated because they are based on an auditory model and are thus closer to the response of the auditory system than conventional features.
    Results.
    In the perceptual evaluation, excellent intra-rater agreement and very good inter-rater agreement were achieved. During the supervised machine learning stage, some conventional features showed unexpectedly low performance in the selected classification scheme. Mel-Frequency Cepstral Coefficients were promising for classifying samples with normal or quasi-normal voice quality.
    Discussion and conclusions.
    Although it is still small and unbalanced, the present annotated database of voice samples can provide a basis for the development of other databases and automatic classification tools. Other authors [9, 10, 11] also found that modeling the non-linear auditory response during signal processing can help develop objective measures that correspond better with perceptual data. However, the classification of highly disordered voices remains a challenge for this set of features, since such voices cannot be correctly classified by either the conventional variables or the measures based on the auditory model. The current results warrant further research into the usability of other types of voice samples and features for automatic classification schemes. Different digital processing steps could be used to improve classifier performance. Additionally, other types of classifiers could be considered in future studies.
    Acknowledgment.
    This work was funded by the Spanish Ministerio de Economía y Competitividad, Project No. TIN2013-47276-C6-2-R, and was carried out at the Campus de Excelencia Internacional Andalucía Tech, Universidad de Málaga.
    References
    [1] Carding PN, Wilson JA, MacKenzie K, Deary IJ. Measuring voice outcomes: state of the science review. The Journal of Laryngology and Otology 2009;123,8:823-829.
    [2] Oates J. Auditory-perceptual evaluation of disordered voice quality: pros, cons and future directions. Folia Phoniatrica et Logopaedica 2009;61,1:49-56.
    [3] Maryn et al. Meta-analysis on acoustic voice quality measures. J Acoust Soc Am 2009;126,5:2619-2634.
    [4] Vaz Freitas et al. Correlation Between Acoustic and Audio-Perceptual Measures. J Voice 2015;29,3:390.e1
    [5] "Multi-Dimensional Voice Program (MDVP) Model 5105. Software Instruction Manual", Kay PENTAX, A Division of PENTAX Medical Company, 2 Bridgewater Lane, Lincoln Park, NJ 07035-1488 USA, November 2007.
    [6] Dejonckere PH, Bradley P, Clemente P, Cornut G, Crevier-Buchman L, Friedrich G, Van De Heyning P, Remacle M, Woisard V. A basic protocol for functional assessment of voice pathology, especially for investigating the efficacy of (phonosurgical) treatments and evaluating new assessment techniques. Guideline elaborated by the Comm. on Phoniatrics of the European Laryngological Society (ELS). Eur Arch Otorhinolaryngol 2001;258:77-82.
    [7] Kreiman et al. Voice Quality Perception. J Speech Hear Res 1993;36:21-4
    [8] De Cheveigné A, Kawahara H. YIN, a fundamental frequency estimator for speech and music. J Acoust Soc Am 2002;111,4:1917.
    [9] Shrivastav et al. Measuring breathiness. J Acoust Soc Am 2003;114,4:2217-2224.
    [10] Saenz-Lechon et al. Automatic Assessment of voice quality according to the GRBAS scale. Eng Med Biol Soc Ann 2006;1:2478-2481.
    [11] Fredouille et al. Back-and-forth methodology for objective voice quality assessment: from/to expert knowledge to/from automatic classification of dysphonia. EURASIP J Appl Si Pr 2009.
    Campus de Excelencia Internacional Andalucía Tech, Universidad de Málaga. Ministerio de Economía y Competitividad, Project No. TIN2013-47276-C6-2-R
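
The trimming and pitch-tracking steps described in the methods can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the 50-500 Hz search range, and the 0.1 threshold are assumptions made for the example, and the YIN routine is simplified (no parabolic interpolation or best-local-estimate step). The demo uses a synthetic tone at a low sampling rate so that the pure-Python loops stay fast; the study recorded at 44,100 Hz.

```python
import math


def trim_mid_portion(samples, sample_rate, cut_seconds=0.5):
    """Drop cut_seconds from each end (e.g. a 4 s take -> its 3 s middle)."""
    n_cut = int(round(cut_seconds * sample_rate))
    if len(samples) <= 2 * n_cut:
        raise ValueError("recording is too short to trim")
    return samples[n_cut:len(samples) - n_cut]


def yin_f0(frame, sample_rate, f_min=50.0, f_max=500.0, threshold=0.1):
    """Simplified YIN: return F0 in Hz, or None if the frame looks unvoiced.

    f_min, f_max and threshold are illustrative defaults (assumptions).
    """
    tau_min = int(sample_rate / f_max)
    tau_max = int(sample_rate / f_min)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame is too short for the requested f_min")

    # Step 1: squared-difference function d(tau) over the analysis window.
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))

    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1.
    cmnd = [1.0] * (tau_max + 1)
    running_sum = 0.0
    for tau in range(1, tau_max + 1):
        running_sum += diff[tau]
        cmnd[tau] = diff[tau] * tau / running_sum if running_sum > 0 else 1.0

    # Step 3: take the first lag that dips below the threshold and follow
    # it down to its local minimum; no such lag means "unvoiced".
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau
    return None


if __name__ == "__main__":
    sr = 8000  # demo rate only (assumption); the recordings used 44100 Hz
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(4 * sr)]
    mid = trim_mid_portion(tone, sr)
    print(len(mid) / sr, "seconds kept")
    print(yin_f0(mid[:800], sr), "Hz estimated for the first frame")
```

For the synthetic 200 Hz tone the estimate should land close to 200 Hz, and an all-zero frame is reported as unvoiced. A production front-end would typically rely on a tested library implementation of YIN rather than this sketch.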

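    Valoración del AVQI (Acoustic Voice Quality Index) como medida de la severidad de la disfonía en castellano [Assessment of the AVQI (Acoustic Voice Quality Index) as a measure of dysphonia severity in Spanish]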

    Objectives.
    The Acoustic Voice Quality Index (AVQI) is an objective method for quantifying the severity of dysphonia based on the analysis of continuous speech and a sustained vowel. The aim of this study is to validate the AVQI in Spanish and to assess its diagnostic accuracy.
    Methods.
    A control group of 24 subjects and a group of 37 dysphonic subjects were studied. Each voice sample was perceptually analyzed by three judges to obtain the overall grade of dysphonia severity (G). Intra- and inter-judge agreement was calculated with the intraclass correlation coefficient (ICC). The external validity of the AVQI was obtained by analyzing the Spearman correlation between the index and the perceptual parameter G. To study the diagnostic accuracy of the AVQI, the ROC curve was used, and its sensitivity, specificity, and positive LR (+) and negative LR (-) likelihood ratios were estimated. The acoustic analysis was performed with the Praat program using the AVQI2 algorithm of Maryn et al.
    Results.
    The intra-judge ICCs were very high (ICC > 0.940), as was the inter-judge ICC (0.986). The ROC curve revealed excellent diagnostic accuracy (area under the curve: 0.94). The cut-off point for the AVQI in the present study was 3.20, with a sensitivity of 0.838 and a specificity of 0.917; LR (+) = 10.10 and LR (-) = 0.10.
    Comment and conclusions.
    The validity of the AVQI in Spanish and its diagnostic accuracy for differentiating between healthy and pathological voices have been demonstrated.
    Universidad de Málaga. Campus de Excelencia Internacional. Andalucía Tech
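
As a worked illustration of how the likelihood ratios relate to sensitivity and specificity, the sketch below applies the standard definitions, LR (+) = sensitivity / (1 - specificity) and LR (-) = (1 - sensitivity) / specificity, to the rounded values reported in the abstract. The function name is hypothetical and the abstract does not say how the authors computed their figures. With the rounded inputs the formula reproduces the reported LR (+) of about 10.10; it gives an LR (-) near 0.18 rather than the reported 0.10, a difference the abstract does not explain.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity (standard definitions)."""
    if not (0.0 < specificity < 1.0 and 0.0 <= sensitivity <= 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive test raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative test lowers the odds
    return lr_pos, lr_neg


if __name__ == "__main__":
    # Rounded values taken from the abstract (AVQI cut-off of 3.20).
    lr_pos, lr_neg = likelihood_ratios(sensitivity=0.838, specificity=0.917)
    print(f"LR(+) = {lr_pos:.2f}, LR(-) = {lr_neg:.2f}")
```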

    Perfil de uso vocal en el profesorado de los colegios públicos de Málaga [Vocal use profile of teachers in the public schools of Málaga]

    In order to assess the current vocal demands of the teaching profession and to help prevent its occupational risks, we have presented an epidemiological study of the features that characterize voice use among 244 teachers from 39 public schools in the city of Málaga. Two self-assessment questionnaires were used for data collection: (1) a survey on the vocal use profile across all daily activities, the contributing risk factors, the type of vocal pathology, and the resulting clinical and occupational consequences; and (2) the MBI test, or Maslach Burnout Inventory, whose implicit aim is to assess the stress level of these professionals. Using the SPSS statistical package, we first carried out a descriptive study of the above variables. We then focused on the inferential analysis of the significant relationships among them. As a result, we were able to characterize three types of vocal use pattern in this teaching population, differentiated on the basis of voice intensity, muscle tension, somatic characteristics, professional profile, environmental noise level, degree of stress, phonasthenia, pharyngolaryngeal paresthesias, care needs, and absences from work. We believe that our conclusions provide useful information for designing preventive programs based on the evidence we have obtained